Hybridized Dimensionality Reduction Method for Machine Learning based Web Pages Classification

Abstract

High dimensionality of the feature space is a well-known problem in the text classification and web mining domains; it is caused mainly by the large vocabulary contained within documents. Several methods have been applied over the years to select the most useful features; however, the performance of such methods can still be improved in aspects such as computational cost and accuracy. This research presents an enhanced cosine similarity-based hybridization of two efficient feature selection methods for higher performance. The reduced feature sets are generated using the Random Projection (RP) and Principal Component Analysis (PCA) methods individually, then hybridized based on the cosine similarity values between the features' vectors. The performance of the proposed method, in terms of accuracy and F-measure, was tested on a dataset of web pages using several term weighting schemes. Compared with relevant methods, the results show significantly higher accuracy and F-measure with a smaller feature set size.

Index Terms: Cosine similarity, Dimensionality Reduction, Feature selection, PCA, Random Projection.
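
The abstract does not spell out the hybridization rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It reduces a document-term matrix with PCA and with a Gaussian random projection separately, then keeps every PCA feature and adds only those RP features whose cosine similarity to all PCA features falls below a threshold. The function names, the `threshold` parameter, and the "keep dissimilar RP features" rule are all assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each term column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                         # shape (n_docs, k)


def rp_reduce(X, k, rng):
    """Gaussian random projection of the rows of X down to k dimensions."""
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R                                 # shape (n_docs, k)


def cosine_matrix(A, B):
    """Cosine similarity between every column of A and every column of B."""
    An = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    Bn = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return An.T @ Bn                             # shape (A_cols, B_cols)


def hybridize(X, k_pca, k_rp, threshold=0.5, seed=0):
    """Hypothetical hybrid: all PCA features plus the RP features that are
    not too similar (in absolute cosine similarity) to any PCA feature."""
    rng = np.random.default_rng(seed)
    Z_pca = pca_reduce(X, k_pca)
    Z_rp = rp_reduce(X, k_rp, rng)
    sim = np.abs(cosine_matrix(Z_rp, Z_pca))     # shape (k_rp, k_pca)
    keep = sim.max(axis=1) < threshold           # assumed selection rule
    return np.hstack([Z_pca, Z_rp[:, keep]])


# Toy usage: 40 "web pages" described by 200 term weights (random stand-in data).
X = np.random.default_rng(1).random((40, 200))
Z = hybridize(X, k_pca=5, k_rp=10)
print(Z.shape)
```

In practice `X` would hold term weights (for example TF-IDF) computed from the web pages, and the reduced matrix `Z` would be passed to a classifier; the threshold would need tuning against accuracy and F-measure.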

Similar resources

Dimensionality Reduction for Colour Based Pixel Classification

In digital images, providing classification based on colour, hue or spectral angle is a problem usually solved by combining a variety of pre-processing steps, as well as object wise classifiers. We have developed a method for transforming colour or multispectral image data to a 1D colour histogram with respect to the digital characteristics of intensity measurements. Classification is then redu...

Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning

In recent years, shared content on the web has grown significantly. A large part of this information is publicly available in the form of semi-structured data. Moreover, a significant amount of this information is related to place. Such information refers to a location on the earth; however, it does not contain any explicit coordinates. In this research, we tried to georefer...

Approach for Dimensionality Reduction in Web Page Classification

Dimensionality refers to the number of terms in a web page. When classifying web pages, their high dimensionality causes problems. The main objective of reducing the dimensionality of web pages is to improve the performance of the classifier. Processing time and accuracy are two parameters that influence the performance of a classifier. To reduce the processing time, less informative and redundant te...

Building an asynchronous web-based tool for machine learning classification

Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not bee...

Journal

Journal title: Iraqi Journal of Computers, Communications, Control and Systems Engineering (IJCCCE)

Year: 2022

ISSN: 2617-3352, 1811-9212

DOI: https://doi.org/10.33103/uot.ijccce.22.3.9